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Abstract I consider techniques for Berger-OIiger adaptive mesh refinement (AMR) when 
numerically solving partial differential equations with wave-like solutions, using charac- 
teristic (double-null) grids. Such AMR algorithms are naturally recursive, and the best- 
known past Berger-OIiger characteristic AMR algorithm, that of Pretorius & Lehner (J. 
Comp. Phys. 198 (2004), 10), recurses on individual "diamond" characteristic grid cells. 
This leads to the use of fine-grained memory management, with individual grid cells kept 
in 2-dimensional linked lists at each refinement level. This complicates the implementation 
and adds overhead in both space and time. Here I describe a Berger-OIiger characteristic 
AMR algorithm which instead recurses on null slices. This algorithm is very similar to the 
usual Cauchy Berger-OIiger algorithm, and uses relatively coarse-grained memory manage- 
ment, allowing entire null slices to be stored in contiguous arrays in memory. The algorithm 
is very efficient in both space and time. I describe discretizations yielding both 2nd and 
4th order global accuracy. My code implementing the algorithm described here is included 
in the electronic supplementary materials accompanying this paper, and is freely available 
to other researchers under the terms of the GNU general public license. 

PACS 04.25.Dm, 02.70.-c, 02.70.Bf, 02.60.Lj 

Keywords adaptive mesh refinement, finite differencing, characteristic coordinates, 
characteristic grids, Berger-OIiger algorithm 

This paper is dedicated to the memory of Thomas Radke, my late friend, colleague, 
and partner in many computational adventures. 



1 Introduction 



Adaptive mesh refinement (AMR) algorithms are now a vital part of computational sci- 
ence and are particularly valuable in the numerical solution of partial differential equations 
(PDEs) whose solutions have a wide dynamic range across the problem domain. Here I fo- 
cus on explicit finite difference methods and PDEs which have propagating- wave solutions. 
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The most powerful and general AMR algorithms for problems of this type are those based 
on the pioneering work of Berger and Oliger (1984) (see also Berger (1982, 1986); Berger 
and Colella (1989)). These algorithms use locally uniform grids, refined in space and time 
as needed, with fine grids (which generally cover only a small part of the problem domain) 
overlaying coarse grids. At each time step, coarse grids are integrated first and spatial bound- 
ary conditions for fine-grid integrations are obtained by time-interpolation from the coarse 
grids. This whole process is applied recursively at each of the possibly-many levels of mesh 
refinement. 

Berger and Oliger's original work, as well as most following work, used Cauchy-type 
coordinates and grids, where initial data is given on a spacelike hypersurface and the solu- 
tion is then computed one spacelike sUce at a time within a numerical problem domain with 
(typically) timelike boundaries. For problems where the propagating- wave PDEs are natu- 
rally posed on an infinite domain, these finite-domain timelike boundaries require radiation 
boundary conditions. For many problems of interest these boundary conditions can only be 
approximate, and for the Einstein equations or similar constrained PDE systems they may 
render the evolution system ill-posed, generate significant boundary reflections, and/or gen- 
erate significant constraint violations. In practice it's often difficult and/or computationally 
expensive to reduce these boundary-condition errors to an acceptably low level.' 

As an alternative to Cauchy formulations, here I consider characteristic formulations, 
where the numerical problem domain's boundaries are null geodesies. This makes it very 
easy to impose boundary conditions on the continuum PDEs in a well-posed and constraint- 
preserving manner, and to approximate these boundary conditions very accurately in the 
finite differencing scheme. While Cauchy-type AMR is now widely used in numerical rel- 
ativity, and characteristic formulations are also not uncommon, there has been much less 
study of Berger-Oliger AMR using characteristic formulations. This is the topic of this pa- 
per. 

The best-known work on Berger-Oliger characteristic AMR is that of Pretorius and 
Lehner (2004), who describe an algorithm which treats the two null coordinates symmet- 
rically, and whose fundamental unit of recursion is the "diamond" double-null characteristic 
grid cell. This leads to their code using very fine-grained memory management, with each 
individual grid point at each refinement level containing Unked-list pointers to its neighbor- 
ing grid points in each null direction. This makes the programming more complicated and 
adds some space and time overhead. Their algorithm has ff{A^) global accuracy, where A 
is the grid resolution. 

In contrast, the AMR algorithm I describe here is much closer to the earlier work of 
Hamade and Stewart (1996), treating the two null coordinates asymmetrically and only re- 
cursing on null slices. (In Cauchy-evolution terms, the slice-recursion algorithm treats one 
null coordinate as a "time" coordinate labelling null slices and the other as a "space" coor- 
dinate labelling events on a null slice.) My algorithm uses relatively coarse-grained memory 
management, with all the grid points in a single null slice level stored in a single set of 
arrays which can easily be stored contiguously in memory. This leads to relatively sim- 
ple programming with only a small loss of efficiency from the coarser-grained adaptivity. I 
describe finite differencing schemes and interpolation operators which yield ^(4^) global 
accuracy, as well as the usual ff{A^). By using C++ templates, my code is able to support 
both cases with no nm-time overhead. 

' See Givoli (1991) for a general review of numerical radiation boundary conditions, and Kidder et al. 
(2005); Rinne (2006); Buchman and Sarbach (2006, 2007); Rinne et al. (2007); Ruiz et al. (2007); Seller et al 
(2008); Rinne et al. (2009) for recent progress towards non-reflecting and constraint-preserving radiation 
boundary conditions for the Einstein equations. 
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To demonstrate the slice-recursion AMR algorithm I use a simple model problem, the 
spherically symmetric real or complex scalar wave equation on a Schwarzschild-spacetime 
background, with a time-dependent Dirac 5 -function source term. This problem is generally 
representative of a wide range of black-hole perturbation problems and, more generally, of 
PDEs where characteristic AMR algorithms may be appropriate. 

The remainder of this paper is organized as follows: the remainder of this section out- 
lines the notation used in this paper. Section 2 describes the model problem. Section 3 gives 
a brief outline of the unigrid finite differencing schemes I use for globally 2nd and 4th order 
accuracy. Section 4 describes how the local truncation error of the finite differencing scheme 
can be estimated. Section 5 describes the slice-recursion AMR algorithm and compares it 
to other Cauchy and characteristic Berger-Oliger algorithms. Section 6 presents tests of the 
AMR algorithm to demonstrate that it is accurate and efficient. Section 7 draws general con- 
clusions. Appendix A gives a detailed description of the unigrid finite differencing schemes 
I use. Appendix B discusses some implementation aspects of the AMR algorithm. 



1.1 Notation 

I generally follow the sign and notation conventions of Wald (1984), with a (— ,-|-,-|-,-|-) 
metric signature. I use the Penrose abstract-index notation, with Latin indices abc running 
over spacetime coordinates. V^, is the covariant derivative operator associated with the 4- 
metric. 

I use upper-case sans-serif letters A, B, C, . . . to label grid points and (in section 5 and 
appendix A) finite difference grids. I describe my notation for finite difference grids in de- 
tail in section 5.1. I use Small Capitals for the names of software packages and (in 
appendix B.4) major data structures in my AMR code, [x] denotes the smallest integer > x. 

I use a pseudocode notation to describe algorithms: Lines are numbered for reference, 
but the line numbers are not used in the algorithm itself. # marks comment lines, while 
keywords are typeset in bold font and most variable names in typewriter font (a few 
variable names are mathematical symbols, such as "i?max")- "X Y" means that the variable 
X is assigned the value of the expression Y. Variables are always declared before use. The 
declaration of a variable explicitly states the variable's type and may also be combined 
with the assignment of an initial value, as in "integer j -(—0". The looping construct "for 
integer X from A to S by C" is inspired by BASIC but also includes a declaration of the loop 
variable (with scope limited to the loop body, as in C++ and Perl). The looping semantics are 
the same as Fortran's "do X = A, B, C", with the increment C defaulting to 1 if omitted. 
Conditional statements use PL/I-inspired syntax (if-then-else). { and } delimit the scope 
of procedures, loop bodies, and either of the branches of conditional statements. Procedures 
(subroutines) are marked with the keyword procedure, and are explicitly invoked with a call 
statement. Procedure names are typeset in typewriter font. When referring to a procedure 
as a noun in a figure caption or in the main text of this paper, the procedure name is suffixed 
with"()", asin"foo()". 



2 Model Problem 

The basic AMR algorithm presented here is quite general, but for ease of exposition I present 

it in the context of a simple model problem. This model problem derives from the calculation 
of the radiation-reaction "self-force" on a scalar particle orbiting a Schwarzschild black hole. 
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but for purposes of this paper the model problem may be considered by itself, divorced from 
its physical context. 

Thus, consider Schwarzschild spacetime of mass M and introduce ingoing and outgoing 
null coordinates u and v respectively, so the line element is 

ds^ = -f{r)dudv + r^dQ^, (1) 

where r is (thus defined to be) the usual areal radial coordinate, f{r) = 1 — 2M/r, and dii^ 
is the line element on a 2-sphere of constant r. It's also useful to define the Schwarzschild 
time coordinate ^schw = j + «) and the "tortise" radial coordinate 

r* = i(v-M) = r + 2Mlog[^-l| . (2) 

In this paper I only consider the region outside the event horizon, r > 2M, so the coordinates 

fschwi and are always nonsingular, ?schw is always timelike, and both r and are always 
spacelike. My computational scheme requires numerically inverting (2) to obtain r(r*); 1 
discuss this inversion in appendix B.l. 

The model problem is the spherically S5mmetric scalar wave equation on this back- 
ground spacetime, with a time-dependent Dirac 5-function source term (stationary in space), 

D(l>+V(r)^ = ^ + y = 5(rschw) 5 (r - rp) , (3) 
auav 

where □ = V"Va is the usual curved-space wave operator, is a real or complex scalar 
field, rp > 2M is a specified "particle" radius giving the spatial position of the source-term 
worldline, V{r) is a specified (smooth) position-dependent potential which varies on a typ- 
ical spatial scale > M and which vanishes at spatial infinity, and 5(;schw) is a specified 
time-dependent (typically highly-oscillatory) source term defined along the source world- 
line r = rp. 

1 define || ■ || to be a pointwise norm on the scalar field (j). For reasons discussed in 
section 4.2, in the complex-scalar-field case || ■ || should be the complex magnitude rather 
than (say) the Li norm = |Re[(^] | -|- |lm[(^] | , even though the latter is slightly cheaper 
to compute. 

As shown in figure 1, the problem domain is a square in (v.m) space, (v,m) e [vmin,Vmax] x 
[Mniin,Mmax]- 1 take the particle worldline r = to symmetrically bisect the domain. For a 
given domain, it's convenient to introduce "relative v" (rv) and "relative m" (ru) coordinates 
rv = V — Vmin and ru = m — m^ji, respectively. 

Initial data must be specified along the "southwest" and "southeast" faces of the domain, 
V = Vmin and u = Mmin respectively. The "northwest" and "northeast" faces of the problem 
domain, u = Mmax and v = Vmax respectively, are ingoing null characteristics with respect to 
the problem domain, so no boundary conditions need be posed there. This is a key advan- 
tage of a characteristic evolution scheme. In contrast, using a Cauchy evolution scheme an 
outgoing-radiation boundary condition is generally required on each timelike boundary. 

Assuming smooth initial data on the southwest and southeast grid faces, (j) is C°° every- 
where in the problem domain except at the particle worldline. In practice, (j) and its spacetime 
gradients display high dynamic ranges across the problem domain, varying rapidly near the 
particle worldline but only slowly far from the worldUne. This makes a unigrid scheme quite 
inefficient and is the primary motivation for using AMR. 

Because of the 5-function source term, at the particle worldline ^ is generically only C\ 
i.e., (j) is continuous but its gradient generically has a jump discontinuity across the particle 
worldline. Any finite differencing or interpolation operators which include grid points on 
both sides of the particle worldline must take the discontinuity into account. 
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3 Unigrid Finite Differencing 

The fovmdation of any mesh-refinement scheme is a stable and locally consistent unigrid 

discretization. To describe this, I introduce a uniform finite-difference grid over the prob- 
lem domain, with equal grid spacing A in v and u. I use j and i as the integer grid-point 
coordinates corresponding to v and u respectively, 

V;=Vnun + 7'^ (4a) 
Ui = Mmin + i'^ , (4b) 

where j and range from QtoN = D/ A inclusive, where D = v^ax — Vmin = Wmax — Wmin is the 
problem-domain size. As is common in finite-differencing computations, I use the notation 
that subscripting a grid function denotes its value at the specified grid point, for example 

^j,i = ^{v=Vj,u=Ui). (5) 

To simplify the finite differencing near the particle worldline, I require that the grid be 
placed such that if the particle worldline passes through a grid cell, it does so symmetrically, 
bisecting through the center of the cell.^ This assumption considerably simplifies the finite 
differencing near the particle worldline, and makes it easy to represent the effects of the 5- 
function source term accurately. (See Tomberg and Engquist (2004) for a general discussion 
of the numerical treatment of 5-function terms in PDEs.) 

The finite difference schemes I consider here are all explicit, with molecules summa- 
rized in figure 2. Briefly, for 2nd order global accuracy I use a standard diamond-cell inte- 
gration scheme (Gomez and Winicour (1992); Gomez etal. (1992); Gundlach etal. (1994); 
Burko and On (1997); Lousto and Price (1997); Lousto (2005); Winicour (2009)), while 
for 4th order global accuracy I use a modified version of the scheme described by Lousto 
(2005); Haas (2007). I describe the finite differencing schemes in detail in appendix A. With 
one exception discussed in appendix A.3, these finite differencing schemes are stable. 

^ This symmetric passage is only possible because r,, is time-independent. For the more generic case 
where r,, is time-dependent, then in general the particle worldline would pass obliquely through the cell. As 
discussed by, for example, Martel and Poisson (2002); Lousto (2005); Haas (2007), this would considerably 
complicate the finite differencing of cells intersecting the particle worldline. However, it wouldn't alter the 
overall character of the AMR algorithm. 




Fig. 1 This figure shows the overall problem domain, and the (m,v) and (fsciiw,'"*) coordinates. The vertical 
dashed line marks the particle worldline. 
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It's important to know the domain of dependence of the finite difference computation of 
(jij^i, i.e., the set of grid points {j+P, i+cc) where 0y_|_|3 is used as an input in computing 
^jj. (I define the "radius" of such a finite difference computation as the maximum of all 
such 1/3 1 and \a\, the radius in the j direction as the maximum of all such |jS|, and the 
radius in the / direction as the maximum of ail such |a|.) For 2nd order global accuracy, 
the domain of dependence is precisely that shown in figure 2, i.e., it comprises the 3 grid 
points For 4th order global accuracy, this set depends on 

the position of {j, i) relative to the particle worldbne, but it never includes points outside the 
4 slices {j,j—l,j—2,j—3} or outside the 4 u positions {i,i—l,i—2,i—3}. 

Given this domain of dependence, there are many possible orders in which the may 
be computed in a unigrid integration. However, the algorithms I consider here all integrate 
the grid points in "raster-scan" order, i.e., using an outer loop over j (so that each iteration 
of the outer loop integrates a single v = constant null slice) and an inner loop over /. 



3.1 Local versus Global Truncation Errors for Characteristic Schemes 

Recall the (standard) definitions of the local and global truncation error of a finite differ- 
encing scheme (Kreiss and Oliger (1973); Choptuik (1991); Richtmyer and Morton (1994); 
LeVeque (2007)): The local truncation error (LTE) is a pointwise norm of the discrepancy 
that results when the exact solution of the PDF is substituted into the finite difference equa- 
tions at a grid point. The global truncation error (GTE) is a pointwise norm of the difference 
between the exact solution of the PDF and the result of solving the finite difference equa- 
tions using exact arithmetic (i.e., without floating-point roundoff errors). The LTE and GTE 
are both grid functions. 

For a stable and consistent Cauchy evolution scheme, the GTE and LTE are normally 
of the same order in the grid spacing A (Kreiss and Oliger (1973); Choptuik (1991); Richt- 
myer and Morton (1994); LeVeque (2007)). However, in a characteristic evolution, errors 
can build up cumulatively over many grid points, so the GTE is generically worse than the 
LTE. For a l-M (space 4- time) dimensional problem such as that considered here, errors can 
accumulate over ff(N^) grid points where N = ff{\/A), so generically an LTE is 

required to guarantee an ^(4") GTE. Corresponding to this, the finite differencing schemes 



2nd order global accuracy 4th order global accuracy 



\ 



O . / ' ^ o ^ 

• • • • 

O point {j, i) where <l>jf is computed 

• point where <t> is used as an input in computing (fijf 



Fig. 2 This figure summarizes the unigrid finite differencing molecules. The left subfigure shows the 
molecule used for 2nd order global accuracy. The right subfigure shows the molecule used for 4th order 
global accuracy far from the particle worldline; near the worldhne the molecules are more compUcated, and 
are shown in figure 10. Open circles show the point where ^ is being computed; soUd circles show points 
where (j) is used as an input in the computation of (pjj. 
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I present here for 2nd and 4th order global accuracy (GTE) actually have 4th and 6th order 
LTE respectively, except near the particle worldline where one order lower GTE is accept- 
able because there are only &[N) near- worldline grid cells. 



3.2 Initial Data 

The globally-4th-order finite differencing scheme summarized in figure 2 and described in 
detail in appendix A. 2 is a 4-level scheme: the finite-difference molecule for computing 
^j^i includes points on 3 previous slices and at 3 previous u positions. Physical boimdary 
conditions are (only) given on the southeast and southwest grid boundaries, so starting the 
numerical integration requires that "extended initial data" somehow be computed, compris- 
ing the physical boundary data, the next 2 slices after the southwest grid boundary, and the 
next 2 grid points on each succeeding slice after the southeast grid boundary. I describe 
several different schemes for constructing the extended initial data in appendix A.2.4. 



4 Estimation of tlie Local Truncation Error 

As is common in AMR algorithms, I implement the "adaptive" part of AMR using an esti- 
mate of the numerical solution's local truncation error (LTE). I compute this via the standard 
technique (LeVeque (2007, section A.6)) of comparing the main numerical integration with 
that from a coarser-resolution integration. In the context of Berger-Oliger style AMR, where 
a hierarchy of grids are being integrated concurrently at varying resolutions, there are sev- 
eral possible choices for the coarser-resolution comparison solution. It can come from a 
separate "shadow" AMR hierarchy (each level of which is coarser than the corresponding 
level of the main AMR hierarchy), or it can come from the next-coarsest level of the (single) 
AMR hierarchy (the "self-shadow hierarchy" technique of Pretorius (2002b); Pretorius and 
Lehner (2004)). 

Here I use a different technique, also used by Hamade and Stewart (1996), where the 
coarser-resolution comparison is obtained locally within the (unigrid) evolution at each level 

of the grid hierarchy, by simply subsampling every 2nd grid point of every 2nd slice. That 
is, suppose we have a finite differencing scheme with (^{A") LTE, whose domain of depen- 
dence for computing is the set of K grid points {(y+jB^fc, |l <k<K} for some 
constants {j3i} and {o;^}. For any {j, i) and 4, let (^j"^' be the value of ^jj obtained from the 
usual numerical integration with grid spacing A. Let (^j^^^^' be the value of 0^./ obtained 
by taking a single step of size 2 A using as inputs the values at the set of K grid points 
{(7-|-2j3i, i-\-2ai,)^ \ <k< K}. Then I estimate the LTE in computing as 

where the normalization factor is obtained by comparing the LTE of a single 24 -sized step 
with that accumulated in 4 separate 4 -sized steps covering the same region of spacetime. 

This scheme is easy to implement and works well, although it does limit the locations 
where the LTE can be estimated to those where (i) the data for a 24 -sized step is available, 
and where (ii) both the A - and 24 -sized steps use the same finite differencing scheme. Con- 
straint (i) implies, for example, that when a new refinement level is created, LTE estimates 



8 



aren't available for it until a few v = constant slices have been integrated on that level. For 
the finite differencing schemes described here, constraint (ii) implies that the LTE estimate 
isn't available for cells within a few grid points of the particle worldline. I haven't found 
either of these constraints to be a problem in practice. 



4.1 Cost of Computing the LTE Estimate 

The cost in space (memory usage) associated with this LTE-estimation scheme is the re- 
quirement that sufficiently many adjacent slices be kept in memory simultaneously for the 
2zi-sized steps. For the globally-2nd-order finite differencing scheme described in section 3, 
the LTE estimator requires data from 3 adjacent slices. As discussed in section 5.3, for 
2nd order global accuracy my AMR algorithm uses interpolation operators which use data 
from up to 4 adjacent slices, so the LTE estimator is "free" in the sense that it doesn't in- 
crease the number of slices needing to be kept in memory beyond what the rest of the com- 
putation already requires. For the globally-4th-order finite differencing scheme described 
in section 3, the LTE estimator requires 7 adjacent slices, while the interpolation operators 
only need 6 adjacent slices, so the LTE estimator adds a | w 17% fractional overhead in 
memory usage. 

I discuss the CPU-time cost of computing the LTE estimate in footnote 9 in section 5.3. 



4.2 Smoothing the LTE Estimates 

In an AMR scheme, mesh-refinement boundaries and regridding operations tend to intro- 
duce small amounts of interpolation noise into the numerical solution, which tends to be 
amplified in the LTE estimate. In particular, for the scheme described here, there are often 
isolated points with anomolously high LTE estimates. To avoid having these falsely trig- 
ger (unwanted) mesh refinements, I smooth the LTE estimates on each slice with a moving 
median-of-3 filter before comparing them to the error threshold. 

Pretorius and Lehner (2004) describe the use of a moving-average smoothing of the LTE 
estimate to address a slightly different problem in their characteristic AMR algorithm: 

The point-wise TE [truncation error] computed using solutions to wave-like finite- 
difference equations is in general oscillatory in nature, and will tend to go to zero 
at certain points within the computational domain . . . , even in regions of relatively 
high tnmcation error. We do not want such isolated points of (anomalously) small 
TE to cause temporary unrefinement, . . . 

For the model problem I consider here, "temporary unrefinement" doesn't seem to be 
a problem in practice so long as the norm || • || is chosen to be the complex magnitude of 
This appears to be because while the complex phase of (p oscillates rapidly along the 
particle worldline, (j) 's complex magnitude tends to remain relatively constant. (In an early 
version of my AMR code where I used the Li norm ||(^||i = |Re[0]| -|- |lm[0]|, I found that 
temporary unrefinement did indeed tend to occur, as the rapidly changing complex phase of 
^ translated into corresponding changes in ||(^ || i •) 

For other physical systems, further smoothing of the estimated LTE might be necessary. 
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5 Adaptive Mesh Refinement 

5.1 The Berger-Oliger Algorithm 

The Berger-Oliger AMR algorithm for Cauchy evolutions of hyperbolic or hyperbolic-like 
PDEs (Berger and Oliger (1984); see also Berger (1982, 1986); Berger and Colella (1989)) 
was first used in numerical relativity by Choptuik (1986, 1989, 1992, 1993), and is now 
widely used for a variety of problems. Schnetter et al. (2004) give a nice summary of 
some of the considerations involved in using the Berger-Oliger algorithm with evolution 
systems which contain 2nd spatial derivatives but only 1st time derivatives. Lehner et al. 
(2006); Briigmann et al. (2008); Husa et al. (2008) discuss adjustments to the algorithm 
(particularly interpolation and prolongation operators) necessary to obtain higher-than-2nd- 
order global finite differencing accuracy in a Berger-Oliger scheme. Pretorius and Choptuik 
(2006) discuss refinements to the standard Berger-Oliger algorithm to accomodate coupled 
elliptic-hyperbolic systems of PDEs. MacNeice etal. (2000); Burgarelli et al. (2006) discuss 
quadtree/octtree grid structures and their use with Berger-Oliger mesh refinement. Many in- 
dividual codes also have published descriptions, including (among many others), AD (Chop- 
tuik (1989, 1992, 1994)), AMRD/PAMR (Pretorius (2002b,a,c)), BAM (Briigmann (1996); 
Briigmann efaZ. (2004, 2008)), Carpet/Cactus (Schnetter aZ. (2004); Schnetter (2001); 
Goodale et al. (2003, 1999)), Chombo/AmrLib/BoxLib (Colella et al. (2009b); Su et al. 
(2006); Colella et al. (2009a); Rendleman et al. (2000); Lijewski et al. (2006)), the Chop- 
tuik et al. axisymmetric code (Choptuik et al. (2003a,b); Pretorius and Choptuik (2006)), 
HAD (Liebling (2002, 2004); Anderson et al. (2006)), GrACE/HDDA/AMROC/DAGH 
(Parashar and Li (2009); Parashar and Browne (2000); Deiterding (2006, 2005b,a); Mitra 
et al. (1995)), Overture (Brown et al. (1999a,b); Henshaw et al. (2002)), PARAMESH 
(MacNeice et al. (2000); Olson and MacNeice (2005); Olson (2006); Olson and MacNeice 
(1999)), and SAMRAl (Hornung and Kohn (2002a, 1999); Hornung etal. (2006); Hornung 
and Kohn (20()2b)). 

Although the focus of this paper is on characteristic Berger-Oliger AMR, it's useful to 
begin with a brief review of the Cauchy Berger-Oliger algorithm. I will only present a few 
of the algorithm's properties that are particularly relevant here; see the references cited in 
the previous paragraph for more extensive discussions of the algorithm, its rationale (i.e., 
why the algorithm is constructed in the way that it is), and how it may be modified to meet 
various situations. 

To describe the Berger-Oliger algorithm it's convenient to consider a generic PDE with 
propagating- wave solutions in 1-|-1 (space-|-time) dimensions, and define global timelike 
and spacelike coordinates t and x respectively.^ 1 consider uniform finite difference grids 
in these coordinates, with indices j and ; indexing the t and x dimensions respectively. In 
contrast to the characteristic-grid case discussed in section 3, I don't assume that the grid 
cells are square, i.e., I don't assume anything about the Courant number (the ratio of the 
time step to the spatial resolution). 

The basic data structure of the Berger-Oliger algorithm is that of a hierarchy of such 
uniform grids, each having a different resolution. The grids are indexed by an integer "re- 
finement level" I in the range < ^ < £max- (In general ^max is time-dependent, but for con- 
venience 1 don't explicitly show this in the notation.) 1 refer to the grid at refinement level £ 
as G^^\ to the time level ("slice") j (i.e., the slice t = tj) of G^^' as G^.^'', to the grid point 
{t=tj,x=Xi) as and to the grid function value(s) at this grid point as G^*^;. In general 

' For the model problem of section 2 these coordinates may be taken to be ( = (schw and x = 
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any ' may consist of several connected components, but here I focus on the simpler case 
where each G^^^ contains only a single connected component'' covering some closed interval 
X 6 [G^P -Xmin , G^j^ -Xmsa] , with Corresponding grid-point indices ; 6 [G . imin • • imax] • As 
suggested by the notation, in general the domain of G'^' isn't rectangular, i.e., in general 
.JCmin, .Xraia, ofKirain, and G^^^imax all vary with j. 

The coarsest or "base" grid G'"^ covers the entire problem domain. For each integer £ 
with 0<£< imax, G'^' has a resolution 2^ times finer than that of G'^''^^ and typically cov- 
ers only some proper subset of the problem domain. Here I consider only "vertex-centered" 
grids, where every 2nd G^^^'^ point coincides with a G*^^^ point.'' The Berger-Oliger algo- 
rithm requires that the grids always be maintained such that for any £, on any slice common 
to both gW and G^^+^\ the region of the problem domain covered by G(^+') is a (usually 
proper) subset of that covered by G^^). I refer to this property as the "proper nesting" of the 
grids.^ 

Each grid G^^^ maintains its own current slice for the integration, denoted G current y 
the j coordinate of this slice is denoted G^'^^ current _j. Each slice is integrated with the 
same finite differencing scheme and Courant number.^ Although I describe each G*^^' here as 
containing the entire time history of its integration, in practice only the most recent few time 
slices need to be stored in memory. The precise number of time slices needed is set by the 
larger of the number needed by the unigrid finite differencing scheme, the LTE estimation, 
and by the interpolations used in the Berger-Oliger algorithm. This is discussed further in 
section 5.3. 

Figure 3 (ignoring the lines marked with •, whose purpose will be discussed in sec- 
tion 5.3) gives a pseudocode outline of the Berger-Oliger algorithm. Notice that the algo- 



For the model problem of section 2 this isn't a significant restriction, since apart from the (ignorable) 
spurious radiation discussed in section 5.3.1, the resolution required to adequately represent <j) tends to de- 
crease monotonically with distance from the particle worldline, so that for a given LTE threshold, the region 
of a shce needing a given resolution is just a single closed interval. For other problems this might not be the 

case, requiring some or all of the to have multiple connected components for good efficiency. This would 
somewhat complicate the data structures, but it wouldn't alter the overall character of the AMR algorithm. 

^ This can easily be generalized to any other integer refinement ratio > 1 (including having the refinement 
ratio vary from one level to another). Larger refinement ratios reduce the f „,ax needed for a given total dynamic 
range of resolution in the refinement hierarchy and thus reduce some of the "bookkeeping" overheads in the 
computation. However, smaller refinement ratios give a smoother variation of the grid resolution (i.e., a 
variation with smaller jumps) across the problem domain, allowing the resolution to be better matched to that 
needed to just obtain the desired LTE at each event, which improves the overall efficiency of the computation. 
For this latter reason 1 use a refinement ratio of 2 : 1 in my AMR algorithm and code. 

* An alternative approach uses "cell-centered" grids, where grid points are viewed as being at the center 
of grid cells and it is these grid cells which are refined (so that the G<^+'' grid points are located | and 
I of the way between adjacent grid points). This approach is particularly useful with finite volume 
discretizations (LeVeque (2002)) and is used by, for example, the PARAMESH mesh-refinement framework 
(MacNeice et al. (2000); Olson and MacNeice (2005); Olson (2006); Olson and MacNeice (1999)) and the 
BAM code (Brugmann et al. (2004, 2008)). 

' The proper-nesting requirement is actually slightly stronger: each G('+') grid point must be far enough 
inside the region covered by G'*' to allow interpolating data from G'*'. In practice, in the v direction this 
requirement is enforced impUcitiy by the Berger-OHger algorithm, while in the u direction this requirement 
must be enforced expUcitiy in the regridding process (procedure shrink_to_ensure_proper_nesting() 
in figure 4). 

* Using the same Courant number at each refinement level (i.e., scaling the time step on each grid propor- 
tional to the spatial resolution) is known as "subcyling in time" and is widely, though not universally, used 
in Berger Oliger AMR codes. Dursi and Zingale (2003) disuss some of the tradeoffs determining whether or 
not subcycling is worthwhile. 
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rithm is recursive, and tliat tliis recursion is at the granularity of an entire slice. That is, the 
algorithm integrates an entire slice (lines 1 1-19 in figure 3) before recursing to integrate the 
finer grids (if any). 

A key part of the Berger-Oliger algorithm is regridding (lines 23-28 in figure 3), where 
the grid hierarchy is updated so that each g'^^ covers the desired spatial region for the cur- 
rent time. As shown in more detail in figure 4, if this requires adding a new G^^^ to the 
hierarchy, or moving an existing G^'^^ to cover a different set of spatial positions than it pre- 
viously covered, then data must be interpolated from coarser refinement levels to initialize 
the new fine-grid points. Finer grids may also need to be updated to maintain proper nesting. 
Because of this, and because of the data copying discussed below, regridding is moderately 
expensive, typically costing times as much (at each level of the refinement hierarchy) 
as integrating a single time step. To prevent this cost from dominating the overall computa- 
tion, regridding is only done on "selected" slices; in practice a common choice is to regrid 
on every kth slice at each level of the refinement hierarchy, for some (constant) parameter 
^ ~ 4. For similar reasons, the LTE estimate is often only computed at every kth grid point.^ 

Figure 5 shows an example of the operation of the slice-recursion algorithm discussed 
in section 5.3. However, parts (a)-(e) of this figure can also be interpreted as an example 
of the operation of the (Cauchy) Berger-Oliger algorithm discussed in this section, with u 
as the spatial coordinate and v as the time coordinate. [The example shown is unrealistic in 
one way: to allow the figure to show a relatively small number of grid points (and thus be 
at a larger and more legible scale), the figure ignores the limits on regridding discussed in 
section 5.3.1, which my code actually enforces.] 

Because the globally-^th-order finite differencing scheme illustrated in figure 5 is a 4- 
level scheme, the extended initial data for the base grid comprises 3 slices and 3 points on 
each succeeding slice. Figure 5a shows the base grid just after the computation of the first 
evolved point of its first evolved slice (i.e., the first slice which isn't entirely part of the 
extended initial data). 

Figure 5b shows an example of the LTE being checked at several points, and (after 
smoothing) exceeding the error threshold at one of these. The regridding procedure thus 
creates a new fine grid (lines 18-22 in figure 4). Figure 5c shows this new fine grid just after 
the computation of the first evolved point of its first evolved slice. Notice that the actual 
spatial extent of the newly-created fine grid is larger than just the set of points where the 
(smoothed) LTE exceeds the error threshold, for two reasons: 

- If the (smoothed) LTE estimate exceeds the error threshold at some location on the 
current slice, then logically we don't know which point(s) in the LTE-estimate molecule 
have inaccurate data. The algorithm thus includes all the LTE-estimate molecule's points 
in the region-to-be-refined (line 18 of figure 3). 

- The algorithm also uses "buffer zones" to further enlarge the region-to-be-refined in the 
spatial direction beyond the set of points just described (lines 20-21 in figure 3). The 
buffer zones are 2 grid points on each side of this set for the example shown in figure 5c. 
The buffer zones are used for two reasons: 

- The buffer zones ensure that moving solution features and their finite-difference 
domains of dependence will remain within the refined region - and thus be well- 
resolved - throughout the time interval before the next regridding operation. 



' The LTE estimate discussed in section 4 rougtily doubles thie cost of integrating a single grid point, so 
estimating the LTE at every kth grid point of every kth slice adds a fractional overhead of roughly 1/^ to the 
computation. For k = 4 (the example shown in figure 5b) this is only about 6%. 
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- The buffer zones also help to ensure that if there are any finer grids in the grid 
hierarchy, the finite-difference domains of dependence for interpolating the next- 
finer grid's spatial bovmdary data from the current grid (line 9 in figure 3; figure 5c,d) 
will avoid regions where the solution is not well-resolved by the current grid (and 
hence the interpolation would be inaccurate). 

When a new fine grid G'^'+'^ is created, at what time level should it be placed relative 
to the next-coarser grid G^^^? There are a number of possible design choices here, rang- 
ing from the time level of the most recent all-points-below-the-error-threshold G^^^ LTE- 
estimate check up to the time level of the some-points-above-the-error-threshold G'^^ LTE- 
estimate check which triggered the creation of the new fine grid. [Placing the fine grid at an 
earlier time level makes the total integration slightly more expensive, but lessens the use of 
insufficiently-accurate coarse-grid data (i.e., G^^^ data whose LTE estimate exceeds the error 
threshold) in interpolating the fine grid's initial data.] As shown in the example of figure 5c, 
I have (somewhat arbitrarily) chosen to place the newly-created fine grid G'^^^^^ with the last 
(most-future) of its initial-data slices (those which are entirely interpolated from the next- 
coarser grid G^'^' ; line 20 of figure 4) at the time level G^^^^^^^^ j_i , one coarse-grid time step 
before the time level on which the over-threshold LTE estimates were computed. 

Each slice G^^^ is a standard 1-dimensional grid function or set of grid functions, and 
so may be stored as a contiguous array or set of arrays in memory. This is easy to program, 
and allows the basic time integration (lines 11-19 in figure 3) to be highly efficient.'*^ '' It 
also means that the amount of additional "bookkeeping" information required to organize 
the computation is very small, proportional only to the maximum number of distinct grids 
at any time. The one significant disadvantage of contiguous storage is that when regridding 
requires expanding G j then the existing data must be copied to a new (larger) set of arrays. 
However, the cost of doing this is still relatively small (less than the cost of a single time 
step for all the grid points involved). 

After the recursive calls to integrate finer grids (lines 33 and 34 in figure 3), g('^+'' 
has been integrated to the same time level as G^^'. Figure 5d shows an example of this. 
Since G(^+'' has twice as fine a resolution as G^^\ it presumably represents the solution 
more accurately at those events common to both grids. To prevent the coarser grids from 
gradually becoming more and more inaccurate as the integration proceeds (which would 
eventually contaminate the finer grids via coarse-to-fine interpolations in regridding), the 
algorithm copies ("injects") g('^+'' back to G^^' at those events common to both grids. This 
is done at lines 36^1 in figure 3; figure 5e shows an example of this. 

5.2 The Pretorius-Lehner Algorithm 

Pretorius and Lehner (2004) discuss modifications to the standard Berger-Oliger algorithm 
to accomodate characteristic evolution. Their algorithm treats the two null directions sym- 

Because the algorithm primarily sweeps sequentially through contiguously-stored grid functions, it 
should have fairly low cache miss rates. Moreover, many modem computer systems have special (compiler 
and/or hardware) support for automatically prefetching soon-to-be-used memory locations in code of this 
type, further reducing the average memory-access time and thus increasing performance. 

" In fact, with appropriate software design the basic integration routine can often be reused intact, or 
almost intact, from an existing unigrid code. For example, this is common when using the CARPET (Schnetter 
et ill. (2004); Schnetter (2001)), PARAMESH (MacNeice et al. (2000); Olson and MacNeice (2005); Olson 
(2006); Olson and MacNeice (1999)), and SAMRAl (Hornung and Kohn (2002a); Homung et al. (2006); 
Hornung and Kohn (2002b)) mesh-refinement infrastructures. 
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1 # integrate G^'^ forward by one time step 

2 procedure tlme_step (integer £) 

3 { 

. current _j <— G(''.current_j + 1 

5 integer j <— GC^.currentJ 

6 if {e = 0) 

7 then set Gj*' spatial boundary data from the physical boundary conditions 

8 and (if used) the extended initial data 

9 else set Gj*' spatial boundary data by spacetime-interpolating from the next-coarser grid G''"^' 
10 

11 # main integration (inchides local-truncation-error estimation at selected points of selected slices) 

12 region where.tojref ine ^ 

13 for integer li from Gj^'.lmin to Gj'^imax 

14 { 

15 Qf}^ ^ iipdate using the unigrid finite differencing scheme 

16 if ((j is a regridding time level) and (ii is an LTE-estimation point)) 

17 then if (smooth(LTE_estimatej^ii) > error.threshold) 

18 then where.tojref ine <— uhere_tojref ine U {i in LTE.estimatej n molecule} 

19 } 

20 if (Mhere_tojref ine 0) 

21 then where.tojref ine <— where.tojref ine U buffer zones 
22 

23 # regrid if necessary 

24 if (j is a regridding time level) 

25 then ceill regrid(£-t-l, where-tojref ine) # add, delete, and/or move 

26 # so it covers the region where_to jref ine 

27 # (may also move or delete finer grids G^^' for 

28 # fc > £+1 in order to maintain proper nesting) 
29 

30 # recurse if necessary 

31 if (G(^+i' exists) 

32 then { 

33 call time_step (^+1) 

34 call time_step(^+l) 
35 

36 # copy ("inject") fine-grid results back to coarse grid 

37 integer jl <- G('+''.current_j 

38 V integer i such that Gj'j coincides with some grid point Gj'^i' of the next-finer grid 

39 { ' 

40 (^^(^ 

41 } 
42 

43 • # re-integrate remainder of coarse slice 

44 • integer i_inax_of _f ine_grid ^ the Gj^' i coordinate which coincides with Gj^^^'^^imax 

45 • for integer i from i_max_of _f ine_grid+l to G^^'.imax 

46 • { 

47 • Gfl ^ update using the unigrid finite differencing scheme 

48 • } ' 

49 } 

50 } 



Fig. 3 This figure gives an outline of the standard Berger-Oliger AMR algorithm (if the lines marlced with • 
are omitted), and of the slice-recursion algorithm (if the lines marked with • are included). See figure 4 for 
an outline of the procedure regridO which is called at line 25. 
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1 # add, delete, and/or move G''^ so it covers the region R 

2 # (may also move or delete finer grids G'*') (ot k > ( in order to maintain proper nesting) 

3 procedure regrld (integer £, region R) 

4 { 

5 if (the next coarser grid G'* does not have enough time levels stored 

6 to allow spacetime-interpolation from G^*"'^') 

7 then return # ignore the regridding request for now; if the need for regridding persists, 

8 # the Berger-Oliger algorithm will keep requesting regridding, until eventually 

9 # G(^~^) will have enough time levels stored for interpolation to be possible 
10 

11 if (R = 0) 

12 then { 

13 destroy any G'*' with k>e 

14 •^max * ^ 1 

15 } 
16 

17 else if (G''' does not exist) 

18 then { 

19 create G'') covering the region R 

20 initialize G^^^ by spacetime- interpolating data from the next coarser grid G^^^^^ 

21 ■^max * ^ 

22 } 
23 

24 else { 

25 move so it covers the region R, initializing any new points 

26 by spacetime-interpolating data from the next coarser grid G^'^^' 

27 call shrlnk_to_ensure_properjiestlng(f+l) 

28 } 

29 } 
30 

31 # shrink G^'^^ and any finer grids as necessary so as to ensure proper nesting 

32 procedure shrink_to_ensure_proper nesting (integer £min) 

33 { 

34 max jradlus <— the largest radius of any interpolation molecule in the i direction 

35 for integer £ from fmin to €max 

36 { 

37 integer j ^ G'^^^^^.currentJ 

38 region R_slirunken ^ [Gj^ ^'.i_min+maxjradius, G^^ ^'.i_raax— maxjradius] 

39 if (G'*' g R_shrunken) 

40 then { 

41 region R_intersection ^ R_shruiikenn region covered by G^^^ 

42 if (R.intersection = 0) 

43 then { 

44 destroy any G*'"' with k > £ 

45 ^max ' k — \ 

46 return # there are now no finer levels, so this procedure is done 

47 } 

48 shrink G^^) to just cover the region R.intersection 

49 } 

50 } 

51 } 



Fig. 4 This figure gives an outline of the regridding procedure regr id ( ) wliich is called by the main Berger- 
OUger algorithm (figure 3), and of the auxihary procedure shrink_to_ensure_proper_nesting() wliich 
is called by regridO . 
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(a): coarse-grid extended initial data; 

spatial-boundary data (figure 3, lines 7-8) ; 
1st evolved point (figure 3, fines 11-15) 

• fine-grid initial and spatial-boundary data 
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(b): integrate coarse grid, check LTE 
(figure 3, lines 11-19) 
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((■); fine-grid initial and spatial-boundary data 
iutcipolatcd from coarse grid (figure 4, 

^ line 20 and figure 3, line 9); 
o fine-grid 1st evolved point 
(figure 3, lines 11-15) 
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(d) : integrate fine grid up to coarse time level 
(figure 3, lines 33 and 34) 
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(e): copy fine-grid results back to coarse grid 
(figure 3, lines 36-41) 
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(f): reintegrate coarse-grid "tail" 
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(figure 3. lines 43-48) 
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Fig. 5 This figure shows an example of the operation of the slice-recursion algorithm, using the globally- 
4th-order finite differencing scheme described in section 3. Parts (a)— (e) can also be interpreted as an example 
of the operation of the (Cauchy) Berger-Oliger algorithm, with u as the spatial coordinate and v as the time 
coordinate. See the main text for further discussion. 
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metrically, and instead of using a separate regridding step, interleaves the integration, injec- 
tion, LTE estimation, and updating of the mesh-refinement hierarchy at a very fine granular- 
ity (essentially that of individual diamond cells). This gives an elegant algorithm where the 
integration can proceed simultaneously in both null directions, "flowing" across the problem 
domain in a way that's generally not known in advance. 

Because of this tmpredictable flow, Pretorius and Lehner don't use contiguous arrays 
to store the grid functions. Rather, they use a fine-grained linked-list data structure, where 
each grid point at each refinement level £ stores explicit pointers to its 4 (null) neighboring 
points at that refinement level, and also to the grid points at that same event at refinement 
levels £±l. The Pretorius-Lehner algorithm explicitly walks these pointer chains to locate 
neighboring points for the (unigrid) integration at each refinement level, to create finer grids, 
and to inject fine-grid results back into coarser grids at each level of the refinement hierarchy. 

In comparison to the contiguous storage possible with the standard (Cauchy) Berger- 
Oliger algorithm, this linked-list storage allocation avoids data copying when grids must be 
grown. However, the per-grid-point pointers require extra storage, and following the pointer 
chains adds some programming complexity and extra execution time.'^ '^ 



5.3 The Slice-Recursion Algorithm 

Here I describe a different variant of the Cauchy Berger-Oliger algorithm for characteristic 
evolution. The basic concept of this algorithm is to treat one null direction (v) as a "time" 
and the other (u) as a "space", then apply the standard Berger-Oliger algorithm as discussed 
in section 5.1 (with one significant modification discussed below). Figure 3 (now including 
the lines marked with •) gives a pseudocode outline of this "slice-recursion" algorithm, and 
figure 5 shows an example of the algorithm's operation for the globally— 4th-order finite 
differencing scheme described in section 3. 

For a Cauchy evolution, the future light cone of a grid point contains only grid 
points on the next t = constant slice. In contrast, for a characteristic evolution, the future 
light cone of a grid point contains all points {j, i) with j > 7* and / > j*. To see the 

impact of this difference on the Berger-OUger algorithm, suppose that on some v = constant 
slice we have a coarser grid G^^' overlaid by a finer grid G'^^+'^ covering the coarse-grid 
coordinate region / e 12] . Then the injection of the fine-grid results back to the coarse grid 
(lines 36-41 in figure 3; figure 5e) restores the coarse-grid solution to the fine-grid accuracy 
for i e [i'l , (2] . However, unlike in the Cauchy case, the coarse-grid solution for the slice 
"tail" / > 12 remains inaccurate, because its computation was affected by the (inaccurate) 
pre-injection coarse-grid region i G [j'l , J2] . The solution to this problem is to re-integrate the 
"tail" i > i2 of the slice after the injection (lines 43^8 of figure 3; figure 5f). 

Depending on the placement of the fine grid relative to the coarse grid, the cost of the 
re-integration may vary from negligible up to roughly the cost of a single time step for the 
coarse grid (i.e., a factor of 2 increase in the cost of the coarse-grid part of the computation). 
This overhead only affects grids with 0<£< ijo^; the finest grid (£ = ^max) never needs to 

The largest execution-time cost is probably that due to (nearby) grid points which are accessed in suc- 
cession not being in contiguous, or even nearby, memory locations. This leads to increased cache miss rates 
and thus poorer performance. 

The execution time of the dynamic storage allocation routines themselves (C's mallocO and freeO, 
or other languages' equivalents) may also be substantial. However, this can be greatly reduced by using 
customized storage-allocation routines that allocate grid points in large batches. These and other optimization 
techniques for dynamic storage allocation are discussed by, for example, Wilson et al. (1995); Lea (2000). 



17 



be re-integrated. In practice tiie re-integration overbad is generally modest; I present numer- 
ical test results quantifying this in section 6. 

Tbe slice-recursion algorithm I present here is quite similar to that outlined by Hamade 
and Stewart (1996). Their algorithm shares the basic Berger-Oliger mesh-hierarchy struc- 
ture, uses the same LTE estimator (section 4), imposes the same nesting requirements on 
the grid hierarchy, and does the same "tail" re-integration (lines 43^8 of figure 3; fig- 
ure 5f). However, their algorithm uses a 4 : 1 refinement ratio between adjacent levels in the 
mesh-refinement hierarchy, whereas I use a 2 : 1 ratio in the slice-recursion algorithm for the 
reasons outlined in footnote 5. They discuss only globally-2nd-order finite differencing. 

When the grid hierarchy contains 3 or more refinement levels, tbe evolution and regrid- 
ding schemes described by Hamade and Stewart (1996) are somewhat different than those 
presented here: Their algorithm integrates child grids at all levels of the grid hierarchy up 
to the same time level before doing any fine-to-coarse-grid injections (lines 36^1 of fig- 
ure 3; figure 5e) or regridding (figure 4), whereas tbe algorithms presented here follow tbe 
standard Berger-Oliger pattern where injections and regridding are interleaved with the evo- 
lution of different refinement levels in an order corresponding to the depth-first traversal of 
a complete binary tree. 

Unlike tbe algorithm of Hamade and Stewart (1996), the slice-recursion algorithm pre- 
sented here is purely recursive, treating all levels of tbe refinement hierarchy in exactly tbe 
same way except for the setup of the base grid's extended initial data and the details of how 
tbe spatial boundary data are determined at the start of each slice's integration (lines 6-9 
of figure 3). An important consequence of tbe algorithm being organized this way is that 
for any integer k > \, the algorithm's computations on a grid hierarchy with k refinement 
levels are identical (apart from tbe initial-data setup just noted) to those in each of tbe re- 
cursive calls (lines 33 and 34 of figure 3) for a problem with k+l refinement levels. More 
generally, the treatment of the k finest refinement levels in the grid hierarchy is independent 
of the presence of any coarser level(s). I find that this simplifies debugging, by making the 
algorithm's behavior on small test problems with only a few refinement levels very similar 
to its behavior on large "physics" problems with many refinement levels. 

Like any Berger-Oliger algorithm, the slice-recursion algorithm needs to interpolate data 
from coarse to fine grids (line 9 of figure 3 and lines 20 and 25-26 of figure 4, figure 5c). 1 use 
a mixture of 1 -dimensional and 2-dimensional Lagrange polynomial interpolation for this, 
in all cases chosen so as to avoid crossing the particle worldline. Appendix B.3 describes 
the interpolation operators in detail. These operators may use data from up to 4 [6] adjacent 
slices for the 2nd [4th] order GTE schemes, which sets a lower bound on tbe number of 
slices of each G^^^ which must be kept in memory. For 2nd order GTE, my code keeps only 
the minimum (4) number of slices in memory; for 4th order GTE, it keeps one extra slice in 
memory (for a total of 7) to accomodate the LTE estimator (section 4; figure 5b,c). 

5.3.1 Avoiding Undesired Mesh Refinement 

When adding a new refinement level to the mesh-refinement hierarchy, the interpolation of 
initial data (Une 20 of figure 4) tends to introduce low-level noise into the field variables. Tbe 
same is true when an existing fine grid is moved to a new position. In either case, this noise 
can cause the LTE estimate to be inaccurate. It's thus useful to allow this noise to decay 
(i.e., be damped out by tbe inherent dissipation in tbe finite differencing scheme) before 
using tbe LTE estimate to determine the placement of another new refinement level. To this 
end, my code suppresses regridding operations for the first 8 [16] time steps of a new or 
newly-moved grid, for the 2nd [4th] order GTE finite differencing scheme respectively. 
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My code also suppresses creating a new fine grid if insufficient data is available for the 
interpolation of all 4 [7] slices kept in memory for the 2nd [4th] order GTE finite differencing 
scheme respectively. 

When using the slice-recursion algorithm for the self-force computation, the arbitrary 
initial data on the southwest and southeast grid faces induces spurious radiation near these 
grid faces. This radiation is of no physical interest, so there's no need for the mesh-refinement 
algorithm to resolve it. Moreover, as discussed in footnote 4, not resolving the spurious ra- 
diation also allows a significant simplification of the code's data structures. Thus my code 
suppresses mesh refinement for the first ~ lOOM of the integration, and for the first ~ lOOM 
of each slice thereafter. 

To reduce the effects of the interpolation noise when adding new refinement levels, my 
code also turns on the mesh refinement gradually, adding new refinement levels only at the 
rate of one each lOM of evolution. More precisely, the code limits the maximum refinement 
level to 

( ifrvu< lOOM 



rvu-lOOM 
WM 



ifrvu> lOOM ' 



where rvu = min(rv, ru) is the distance from the closest point on the southeast or southwest 
grid faces. 



6 Numerical Tests of the AMR Algorithm 

As a test case for the slice-recursion AMR algorithm, 1 consider a particular example of 
the model problem of section 2, which arises in the course of calculating the radiation- 
reaction "self-force" on a scalar particle orbiting a Schwarzschild black hole (Barack and 
Ori (2002), see Barack (2009) for a general review). 1 take to be a complex scalar field, 
with the potential V{r) and source term 5(fschw) given by 

Simitschw) = / exp(-/wa)(rp)fschw) , (9) 



where 



(0{r) = J^ (10) 



and 



E{r) = f{r)[\-—j . (11) 

are respectively the orbital frequency and energy per unit mass of a particle in circular orbit 
at areal radius r in Schwarzschild spacetime. The coefficients a^m are defined such that the 
spherical harmonic Y(„{d=j,q)) = a(me'""^, i.e.. 



' + m-l)\\{e-m-iy.\ . 

ir i—m IS even 



if £—m is odd 
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where the "double factorial" function is defined by 




(13) 



For this test, I take £ = m = 10, rp = lOM, and use a problem domain size of D = 200M 
on a side. Here Vmin = 12.773M, = — 12.773M, and the grid extends from rv = OM to 
rv = 200M and from ru = OM to ni = 200M. 

The physical boundary data is zero along the southwest and southeast grid faces, and 
the extended initial data is computed using the 2-level subsampling scheme described in ap- 
pendix A. 2.4. The base grid has a resolution of 0.25M, the finite differencing is the globally- 
4th-order scheme, and the error tolerance for the LTE estimate is 10^"". As discussed in 
section 5.3.1, mesh refinement is suppressed for the first lOOM of the evolution and the first 
lOOM of each slice thereafter, and then turned on gradually at the rate of one refinement 
level for each further lOM of evolution. 

Figure 6 shows a "map" of the mesh refinement, giving the highest refinement level at 
each event in the problem domain. Notice that the highest-refinement grids only cover small 
regions close to the particle worldline. 

The online supplemental materials which accompany this paper include a movie (online 
resource 1) showing the spacetime-dependence of (j) and the placement of the refined grids; 
figure 7 shows several sample frames and explains them in more detail. 

Because of the adaptive placement of refined grids, it's difficult to do a standard conver- 
gence test (Choptuik (1991)). Instead, I have followed Choptuik (1992) in adding an option 
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Fig. 6 This figure shows a "map" of the mesh refinement, giving the highest refinement level at each event in 
the problem domain. The vertical dashed line labelled "particle" shows the particle worldline. The diagonal 
dashed line labelled "slice" shows the rv = 150M slice; figure 8 shows convergence tests on that slice. 
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Fig. 7 This figure shows sample frames from the movie accompanying this paper (online resource 1). Each 
frame of the movie shows a single rv = constant slice, and plots Re[(|)], Im[(^], and ||0|| on the left scale, 
and the spatial extent of each refinement level (on the right scale) as the horizontal lines. The u, ru, and 
rv coordinates are all shown in units of the Schwarzschild spacetime mass M. Each frame of the movie also 
includes a subplot (in the lower right comer) showing the location of that plot's rv = constant slice within 
the entire problem domain. The vertical line in the subplot shows the particle worldline. 
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to my code to write out a "script" of the position and grid spacing of each grid generated 
by the AMR algorithm, and a related option to "play back" such a script. For a convergence 
test, I first run a "record" evolution, then generate a "playbackxA^" script by refining each 
grid in the script by a chosen (small integer) factor N, and finally "play back" the refined 
script. 

Figure 8 shows an example of such a convergence test for the field (j) on the rv = 150M 
slice. The code shows excellent 4th order convergence in the interior of each grid, across 
mesh-refinement boundaries, and near to the particle. 

Profiling the code shows that almost all of the CPU time is consumed in the basic 
diamond-cell integration code, and the code's overall nmning time is closely proportional to 
the number of diamond cells integrated. In other words, the AMR bookkeeping consumes 
only a negligible fraction of the CPU time. 

For this evolution, the AMR evolution integrates a total of 35.6 x 10* diamond cells. 
Of these, 16% are accounted for by the re-integration of coarse slices after fine-grid recur- 
sion (lines 43^8 of figure 3). A hypothetical unigrid evolution covering the entire prob- 
lem domain at the resolution of the finest AMR grid (level 5) would require integrating 
655 X 10* diamond cells, 18.4 times as many as the AMR evolution. That is, for this prob- 
lem the AMR evolution was approximately a factor of 18 faster than an equivalent-resolution 
(and thus equivalent-accuracy) unigrid evolution. 

To further characterize the performance of the slice-recursion algorithm, I consider a 
sample of 295 separate evolutions of the same model problem, with each evolution having a 
different {£, m) in the range < £ < 40 and () <m <l, and a problem-domain size between 
400M and 30000^^.'"* The globally-4th-order scheme is used for all of these evolutions, 
with an error tolerance for the LTE estimate of 10~'*. Figure 9 shows histograms of the 
re-integration overhead (the fraction of all diamond-cells integrations which occur as part of 
coarse-grid re-integrations) and the AMR speedup factor for this sample of evolutions. The 
re-integration overhead is usually between 20% and 25%, and never exceeds 30%. 

For this sample the AMR speedup factor varies over a much wider range, from as low 
as 8.9 to as high as 400. The median speedup factor is 19, and 95% of the speedup factors 
are between 12 and 51. 



7 Conclusions 

The main result of this paper is that only a small modification (the tail re-integration, i.e., the 
lines marked with • in figure 3) is needed to adapt the standard Cauchy Berger-Oliger AMR 
algorithm to characteristic coordinates and grids. The resulting "slice-recursion" algorithm 
uses relatively coarse-grained control for the recursion and the adaptivity part of AMR, 
which greatly simplifies the memory management, allowing entire null slices to be stored 
in contiguous arrays in memory. The algorithm can readily accomodate any order of finite 
differencing scheme; I present schemes for both 2nd and 4th order global truncation error. 

The numerical tests results presented here demonstrate that the slice-recursion algorithm 
is highly efficient and displays excellent 4th order convergence. This algorithm readily acco- 
modates a moving "particle" Dirac 5-function source term (which leads to a solution which 
is generically only C' at the particle position); there is no loss of convergence there. 



This set of evolutions arises naturally as part of a calculation of the radiation-reaction "self-force" on a 
scalar particle orbiting a Schwarzschild black hole. Details of this calculation will be reported elsewhere; for 
present purposes these evolutions provide a useful set of test cases for the AMR algorithm. 
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Fig. 8 Tliis figure shows and its convergence on ttie rv = 150M slice. In each subfigure, the soUd vertical 
line at ru = 150M shows the particle position. The upper subfigure (which shows the entire slice) plots the real 
and imaginary parts of on the right (linear) scale, and the complex norm \\(p\\ on the left (logarithmic) scale. 
It also plots one of the convergence differences ||record — playbackx2[| on the left (logarithmic) scale. The 
horizontal lines show the portion of this slice covered by each refinement level. The lower subfigure (which 
shows an expanded view of a ±8M region centered on the particle position) plots the real and imaginary parts 
of (p on the right (linear) scale, and the four convergence differences (scaled by the 4th power of the reso- 
lution) on the left (logarithmic) scale. Notice that the three higher-resolution scaled-convergence-difference 
curves are almost superimposed, indicating excellent 4th-order convergence, and that this convergence is not 
degraded either across the mesh-refinement boundaries at rv = 143. 3M and rv = 153. IM, or near the particle 
at rv = 150M. 
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The main potential disadvantage of the slice-recursion algorithm over the standard Cauchy 
Berger-Oliger algorithm is the cost of the tail re-integration. This cost depends on the place- 
ment of fine grids relative to their coarser parent grids; the worst possible case is a factor 
of 2 overhead in the coarse-grid part of the integration. There is no overhead for the finest 
level of the mesh-refinement hierarchy. In practice I find the re-integration cost to be quite 
modest, typically about 20-25% of the total computation, and (in my tests) never exceeding 
30%. Compared to an equivalent-resolution imigrid evolution, I find that the slice-recursion 
algorithm is always much faster, with the exact speedup factor varying widely from one 
problem to another: for a sample of 295 test cases, 95% of the speedup factors are between 
12 and 51. 

The largest practical obstacle to the use of Berger-Oliger mesh refinement algorithms, 
including the slice-recursion algorithm presented here, is probably their implementation 
(programming) complexity. This is much less of an obstacle if codes can be shared across 
projects and researchers. To this end, my code implementing the slice-recursion algorithm 
is included in the electronic supplementary materials accompanying this paper (online re- 
source 2), and is freely available to other researchers under the terms of the GNU general 
public hcense. This code uses C++ templates to support both 2nd and 4th order global trun- 
cation error, and both real and complex scalar fields, with no run-time overhead. It would be 
fairly easy to adapt the code to other finite differencing schemes and/or PDEs. 

Pretorius and Lehner (2004, sections 4.2 and 4.3) describe how their characteristic 
Berger-Oliger algorithm can be extended to problems of higher dimensionality than 1-1-1 
(i.e., problems whose domains have multiple spacelike dimensions), and how the algorithm 
may be parallelized. The techniques they describe should all apply equally to the slice- 
recursion algorithm presented here. 
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scribed in this manuscript. 
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Fig. 9 This figure shows histograms of the re-integration overhead (the fraction of all diamond-cell inte- 
grations which occur as part of coarse-grid re-integrations, i.e., those in lines 43^8 of figure 3) and of the 
AMR speedup factor (the ratio of the nuinber of cells that would have been integrated in hypothetical unigrid 
evolution covering the entire problem domain at the resolution of the finest AMR grid, to the number of 
cells actually integrated in the AMR evolution) for a sample of 295 evolutions. The re-integration overhead is 
usually between 20% and 25%, and never exceeds 30%, while the speedup factor varies much more widely. 
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A Details of the Unigrid Finite Differencing Sclieme 

In this appendix, 1 describe the finite differencing schemes in detail for the model problem (3), for both 2nd 

and 4th order GTR. 

In this appendix only, when describing the computation of at a particular grid point (j.i), 1 sometimes 
use an abbreviated notation for grid-point indexing, denoting the grid point [ j — n.i — m) by a subscript nm. 
Such subscripts can be distinguished from the usual grid-point indices by the absence of a comma between 
the two indices. These abbreviated subscripts may be either integral or half-integral, and in this latter context 
(only) I also use the abbreviation h=\. For example, ^12 denotes ^j_i^,_2, while denotes <t>j^i-i/2- 



A.l Second Order Global Accuracy 

To discretize the wave equation (3) to 2nd order global accuracy in the grid spacing A, 1 use a standard 
diamond-cell integration scheme (Gomez and Winicour (1992); Gomez et al. (1992); Gundlach et al. (1994); 
Burko and On (1997); Lousto and Price (1997); Lousto (2005); Winicour (2009)): Consider a double-null 
"diamond" grid cell of side A, with "north", "west", "easf , and "south" vertices (grid points) N, W, E, and 
S respectively, and central point C, as shown in figure 10. 

Consider first the vacuum case, where the particle worldline r = rp doesn't intersect the cell and hence 
the right hand side of the wave equation (3) vanishes everywhere in the cell. Integrating this equation over 
the cell then gives 

+ 0s - .l-E - <Kv) + (4 Vc^^i^ + G{A')\ = 0, (14) 
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or equivalently 



0N = «l>E + <^v-<l>s-4Vc^^i^ + <?(zi'*), (15a) 

where subscripts denote the value of the field at the corresponding grid point. 

If the particle worldline r = r,, intersects the cell, then as discussed in section 3, I assume it does so 
symmetrically. Integrating the right-hand-side source term in (3) over the grid cell then adds an extra term 

f 2 /-'c+^/z 

/ S(tscbw)S(r-rp)dvdu = S(?schw)*Schw (15b) 
J(xn JVp) 

to the right hand side of (15a), where tc = (?schw)c- ^or the test case considered in section 6, this source-term 
integral becomes 



, Sfafechw)^tschw = 1"% exp{-imco{rp)tc)A sipc(^mco(rp)A) + &{A^) . (16) 



A.2 Fourth Order Global Accuracy 

To discretize the wave equation (3) to 4th order global accuracy in the grid spacing 4 , 1 use a scheme based 
on that of Haas (2007) (see also Lousto (2005)), but modified in its treatment of cells near the particle. The 
modification makes the scheme fully explicit, removing the need for an iterative computation at each cell 
intersecting the particle. However, the scheme is only valid with the assumption noted above, that if the 
particle intersects a cell it does so symmetrically. In practice (cf. footnote 2), this means that the scheme is 
only valid for the case where r,, is constant, i.e., the particle is in a circular orbit around the central black 
hole. 

Consider the computation of ^jj, and suppose the particle worldline intersects the v = Vj slice at the grid 
point i = ip. There are several different cases for the finite differencing scheme, depending on the sign and 
magnitude of ( - ip. Figure 10 shows all of these cases. 



A.2.1 \i — ip\>'i (Cell far from the particle) 

If \i — ip\ > 3 (so that the particle worldline doesn't intersect any part of the finite difference molecule), 
then I use the finite differencing scheme of Haas's equations (2.7), (2.10), and (4.10). That is (following 
Lousto (2005) and Haas (2007)) I first define the new grid function G = V^.I then interpolate via Haas's 
equation (2.7), which in my notation reads 

Gkh = ^ [(8Gio + 8Goi) + (-4G20 + 8Gii - 4Go2) + (G30 - G21 - G12 + G03)] + &(A^) . (17) 
1 then compute Gj = G^o + Goh + G^ -I- G^i via Haas's equation (2. 10), which in my notation reads 
G£ = 2{\-\AlVu)Gm 

+ \aIVm)Vk>U +{\- |4|yos)%0oi 

+\(vHo-2VHh + Voh){^io + hi), (18) 

where A2 = A /2. 

Finally, I compute (/iqo via Haas's equation (4.7), which in my notation reads 



)0 = -<l 

+ 


»11 




+1 


[- 





- [(1 - ^AlVhhjA^j (Gz +4Gm), (19) 

where ^3 = 4/3. 
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A. 2. 2 i = ip±l or i = ip±2( Cell near the particle) 



If i — ip = — 1 or —2 (so we are computing at a grid point near to, but to the right of, the particle), then 
the interpolation (17) would use data from both sides of the particle's worldline (hereinafter I refer to this 
as "crossing" the worldline), violating the smoothness assumptions used in the interpolation's derivation. 
Instead, I use the "right" interpolation 

Ghh = 5G10 + jGoi + ^Gii - 5G02 — 5G21 + jGi2 + 0{A^), (20) 

which (since i < ip) doesn't cross the worldline. 

If i — !p = + 1 or +2 (so we are computing at a grid point near to, but to the left of, the particle), then 
again the interpolation (17) would cross the particle's worldline, so I instead use the "left" interpolation 

Ghh = |Goi + 5G10+ 5 Gil - 5G20 — 5G12 + 5G21 + G{A^), (21) 

which (since > ip) doesn't cross the worldline. 

[Each of the interpolations (17), (20), and (21) is actually valid for any grid function which is smooth 
(has a convergent Taylor series in v and u) throughout the region spanned by the interpolation molecule.] 

Once Ghh has been computed, and then Goo can be computed in the same manner as before, i.e., 
via (18) and (19) respectively.'^ 



A.2.3 i = ip ( Cell symmetrically bisected by the particle) 

If (' = ip, then for each k G {1,2,3} I first compute an estimate 0j*' for using the globally-2nd-order 
scheme (15) appUed to the diamond cell of size kA defined by the four grid points N^*) = {j,i), W'*' = 
(;■-*:,(•), eW = {j,i-k), and SW = {j-k,i-k).Iihen assume a Richardson expansion for (j) W (v, m) in the 
effective grid spacing kA (Choptuik (1991)), 

(j)^''^ {v,u) = <j){v,u) + {kA)^ P{v.u) + (kAfQ(v.ii) + ^{A^) , (22) 

where P and Q are smooth functions which do not depend on the grid spacing, and where the ^(M ^ ) leading- 
order error term is that of (15b). Finally, 1 Richardson-extrapolate the value of = ^{v,u) from the three 

estimates (j/jf to obtain 

There are only fi'{N) i = ip cells, so this fi{A^) LTE suffices to keep these cells' collective contribution 
to the global error at ff{A*). 



A.2.4 Initial Data 



As shown in figure 10, the globally-4-th-order finite difference molecules for computing <j)j i generally include 
points on the j—l, and j—3 slices, and at the i—l, 1— 2, and i— 3 u positions. This means that starting the 
integration on any grid requires "extended initial data" on 3 slices and on 3 points on each succeeding slice. 
This poses a problem for setting up the base grid at the very beginning of the integration,'* since physical 
boundary data is only specified along the southwest and southeast faces of the problem domain, i.e., on a 
single initial slice and at a single initial grid point of each succeeding slice. 
There are several possible ways of obtaining the extended initial data: 

Replication of the Physical Boundary Data For the self-force problem (Barack and Ori (2002)) the 
precise choice of boundary data doesn't matter, and it's acceptable to approximate the PDEs to a lower order 
of accuracy near the boundary. Thus for this use, the physical boundary data (in this case 0=0) can simply 
be replicated throughout the extended-initial-data region. 



Notice that once Ghh is known, (18) and (19) use only the grid-function values (pio, 001, and 0ii, and thus 
require only that the particle worldline doesn't pass between these points; this condition is satisfied whenever 

iy^ip. 

There's no problem in starting the integration of other grids, since the extended initial data for any G^*' 
with <? > is interpolated from its parent grid G'*^'' (line 20 of figure 4). 



29 



Taylor-series Approximation For more general purposes, it's usually desirable to approximate the PDE 
to full accuracy everywhere in the problem domain, i.e., to compute the extended initial data to ff{A'^) accu- 
racy. Lousto (2005, section 4.1) describes a Taylor-series approximation scheme to do this. 



Two-Level Subsampling Another scheme for computing extended initial data is to define an auxiliary 
"subsampling" grid with spacing ^ A, which only covers the extended-initial-data region. Choose /Iss 
such that it integrally divides A and such that ziss cx A^ for sufficiently high resolution. The extended ini- 
tial data can now be computed by integrating the auxiliary grid using the globally-2nd-order scheme, then 
subsampling data from the auxiliary grid to the main grid. Because A^s °= 4^ at sufficiently high resolution, 
ff{Ais^) = 6'(A'^), i.e., the global accuracy remains 4th order with respect to the main-grid resolution A. 

The auxUiary grid only needs to cover the first 3 slices of the main grid, together with the first 3 grid 
points of each later main-grid slice. However, this corresponds to &(\/A) auxiliary-grid spacings, so inte- 
grating all the auxiliary-grid points requires 0(1 /A^) CPU time and &{\/A'^) memory. This is much more 
expensive than the main-grid computation, which only requires 0{\/A^) CPU time and ff(l/A) memory. 



4th order: molecule far 
from the particle 

2nd order (Gkh computed via (9)) 



O . / ^ - ^ o 

^ " \ , . / 

r M X ^ 

X K X 

■ X X X X 



4th order: molecule near to, 
and to the left of, the particle 
(Ghh computed via (13)) 



■^'x O I 

^ M * 
X M i 



4th order: molecule near to, 
and to the right of, the particle 
(Ghh computed via (12)) 



I O " 

j M X 
X X 



4tli order: molecule Kynnnotrically 
bisected by the particle 



\ 

\ 



O point (j/', where & is computed *^ ■ 

-|- cell-center point (0 is not used) g 
■ point where is used in the globally-2nd-order scheme (8) 

(either directly or via the Richardson-extrapolation scheme (15)) 
X point where <^ is used in interpolating Ghh via (9), (12), and/or (13) 
• point where ^ is used in computing Gs via (10) and/or (^oo via (11) 



Fig. 10 This figure shows the unigrid finite differencing molecules. For clarity, the 2nd order molecule is 
enlarged relative to the 4th order molecules. In the lower 3 subfigures the vertical dashed lines show possible 
positions for the particle worldline. 
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Recursive Doubling A more efficient approach is to use a recursive-doubling scheme. Here we use a 
sequence of q auxihary grids A'"', A''', A'^', .... A'*^'', with A'*' having spacing 2*"'/!. We choose q 
such that 2^1 A < < 2'^'' A for some A-, « A^ for sufficiently high resolution. The extended initial data 
for the main grid can now be computed as follows (see figure 11 for an example): First integrate A'"' using 
the globally-2nd-order scheme for 4 slices, and for 4 points on each succeeding slice. Then for each k= I, 
2, 3, ... , q— 1 , subsample from A'*" ' ' to obtain the extended initial data to integrate A'*' using the globally- 
4th-order scheme for 2 slices, and for 2 points on each succeeding slice. Finally, subsample from A'*"'' to 
obtain the extended initial data to integrate the main grid using the globally-4th-order scheme. 




• point in auxiliary grid A^"* (spacing 1) computed using globally-2nd-order scheme 
O point in auxiliary grid A^^' (spacing 2) computed using globally-4th-order scheme 

(extended initial data subsampled from auxiliary grid A^''^) 
O point in auxiliary grid A(^) (spacing 4) computed using globally-4th-order scheme 

(extended initial data subsampled from auxiliary grid A^^') 

point in main grid (spacing 8) computed using globally-4th-order scheme 

(extended initial data subsampled from auxiliary grid A<^') 



Fig. 11 This figure shows an example of the recursive-doubling technique for constructing extended initial 
data for the globally-4th-order initial data scheme. The grid axes are plotted in terms of 5 and 
Sv = V - v,ni„, and the grid spacings are in units of finest (4^^') auxiliary grid spacing. In this example 
q = 3 auxiliary grids are used. 
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Each auxiliary grid only has to cover 2 or 4 slices, and 2 or 4 points on each succeeding slice, so this 
scheme is much more efficient than the two-level subsampling scheme: in the high-resolution limit the total 
cost of all the auxiliary-grid integrations is 0'{l/A^) CPU time, the same order as the main-grid computation. 
Unfortunately, even with careful memory management (discarding grid points as soon as they're no longer 
needed) the auxiliary grids still require ^(1/4^) memory, much more than the main-grid computation's 
i^{l/A) memory requirement. 

It's relatively easy to implement the recursive-doubling scheme in a code using the type of fine-grained 
linked-list data structures described by Pretorius and Lehner (2004), where the integration can "flow" in either 
the V direction or the u direction at any stage in the computation. However, for the slice-recursion algorithm 
we typically require that the integration proceed sequentially in the v direction and that any data reuse or 
subsampling take place within the small (typically 4-7) number of slices kept in memory at each refinement 
level. The recursive-doubling initial-data scheme thus requires interleaving the integration of the different 
auxiliary grids along the southeast face of the grid. Figure 12 gives a pseudocode outline of an algorithm to 
generate the appropriate sequence of integrations, subsamplings, and other grid operations for the recursive- 
doubling scheme. 



A.3 The Coarse-Grid Instability 

The finite differencing schemes discussed here become unstable at very low resolutions, in a manner some- 
what resembling the classic Courant-Friedrichs-Lewy instability of Cauchy finite differencing. I have not 
mathematically analyzed this instability," but empirically it only occurs for very low resolutions (large A), 
with the instability threshold (the smallest A for which the instability appears) depending on C, but not on m. 
Figure 13 shows the /: and A for which the instability occurs. Notice that the instability threshold decreases 
gradually with f., and is somewhat smaller for the globally-4th-order scheme than for the globally-2nd-order 
scheme. 

In practice, it's rare for this "coarse-grid instability" to be a significant problem because reasonable 
accuracy requirements normally force much higher resolutions than those where the instabUity would occur. 
The one exception to this is the base grid, which might otherwise be made very coarse (allowing the AMR 
to refine it as needed); the coarse-grid instability prevents this by requiring the base grid to be finer than the 
instability threshold. 



B Implementation Details 

B.l Computing r(r,) 

The finite differencing schemes discussed here use finite-difference grids which are locally uniform in v and u, 
so it's trivial to compute the r, coordinate of any grid point. However, the coefficients in the wave equation (3) 
are all given as explicit functions of r, so the code needs to know the r coordinate of each grid point (and, 
for the globally-4th-order scheme, also of the center of each grid zone). My code computes this as follows: 
Define 



so that r = 2M(1 +e^) and the definition (2) becomes = + . Then y{xt) (and hence r(r,)) can be 
found by using Newton's method to find a zero of the function 




(24b) 



(24a) 



2M' 



h(y) = l+y + e^-X:, 



(25) 



An initial guess for Newton's method can be obtained by neglecting either y or e'' in (25), giving 




(26) 



Gomez and Winicour (1992); Gomez et al (1992) discuss the stability of diamond-cell integration 
schemes in spherical symmetry; Welling (1983); Winicour (2009, section 3.3) discuss subtleties in apply- 
ing the CFL condition to a more general null-cone evolution algorithm in axisymmetry. 
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1 global integer fmax # maximum £ for which we have created A^^) 
2 

3 procedure recurslve_doubllng_initial_cLata (integer q) 

4 { 

5 call create_grid(0) 
6 

7 for integer j from 1 to 

8 { 

9 if (j = 5 ■ 2*"-) 

10 then A(^""'.lniax <— 4 • 2*""" # discard A^*""' grid points which are no longer needed 

11 if {{ema:. < q) and (j = 6 • 2«-™)) 

12 then call create_grid(£max+l) 
13 

14 for integer £ from to £max 

15 { 

16 if (j mod 2' = 0) 

17 then call lntegrate_sllce(f, j) 

18 } 



19 } 

20 } 
21 

22 # create the auxiliary grid A^*' and update £,nax 

23 procedure create-grld(integer I) 

24 { 

25 Create the auxihary grid A^^' with spacing d = 2^ and size aw. imax — the size of the main grid 

26 Aq <— physical boundary data on the southwest grid face 

27 if {e > 0) 

28 then { 

29 A^'' ^ subsample from A'^'-i' 

30 A<^' ^ subsample from A'^-^) 

31 } 

32 ^max * ^ 

33 } 
34 

35 # integrate the auxiliary grid A^'^^ on the slice j 



36 procedure integrate_slice(integer £, integer j) 

37 { 

38 A^'(* ^ physical boundary data on the southeast grid face 

39 if (C = 0) 

40 then { 

41 for integer i from 1 to A^^^.imax 

42 { 

43 Afl ^ update using the globally-2nd-order scheme 

44 } 

45 } 

46 else { 

47 integer 5 <— 2* 

48 a'*J ^ subsample from A<'-1' 

49 A**^^^ ^ subsample from A^'-'^'l 

50 for integer i from 3i5 to A(''.l„;„ by S 

51 { 

52 Afl ^ update using the globally-4th-order scheme 

53 } ' 

54 } 

55 } 



Fig. 12 This figure gives an outUne of the recursive-doubling algorithm for constructing extended initial data 
for the globally-4-th-order initial data scheme. Coordinates for all grids are measured in units of the finest 
(A'"') auxiliary grid spacing, relative to the south corner of the problem domain. Thus, for example, the grid 
aP) has spacing 2^ = 8 and uses grid-point indices {0,8,16,24,32,40,... }. 
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The Newton's-method solution is moderately expensive for a computation which (logically) is needed at 
each grid point: it typically requires 3-10 iterations, with each iteration needing an expO computation and 
several other floating-point arithmetic operations. Fortunately, within any single grid r depends only on j — i, 
so in a unigrid code it's easy to precompute r for all possible values of j—i (of which there are only iff(N) 
for an NxN grid) when the grid is first set up. For the slice-recursion algorithm a somewhat more dynamic 
"radius cache" of r coordinates is needed, with updates each time regridding grows, shrinks, or relocates a 
grid. However, the set of j—i involved is still always a contiguous interval, so the cache bookkeeping overhead 
(over and above the storage arrays for the r coordinates themselves) is only ^(1) per grid. 



B.2 Local Coordinates for each Refinement Level 

Consider a single slice, and a pair of adjacent grid points in it at some refinement level (, say Gj ^ and G\ ^^j, 

viewed as events in spacetime. Since the grid spacing of G(*+*) is 2* times finer than that of G^, these same 
two events are necessarily 2* grid points apart in G'*+*' . This means that it's impossible to define integer grid- 
point coordinates which simultaneously (a) have adjacent grid points separated by 1 in the integer coordinates 
at each refinement level, and (b) assign a given event the same integer coordinates at each refinement level. 

In my AMR code I keep property (a), but discard property (b): each mesh-refinement level has its own 
local integer coordinate system for indexing grid points, and the code maintains expUcit fine-to-coarse and 
coarse-to-fine coordinate transformations between each pair {£,i+l) of adjacent refinement levels. These 
transformations are used when interpolating data from coarse to fine grids (discussed in detail in section B.3), 
when injecting fine-grid results back to coarse grids (lines 36-41 in figure 3), in setting up the "tail" re- 
integration (lines 43-48 of figure 3), and in checking the proper-nesting condition in regridding (lines 31—48 
of figure 4). This scheme has worked very well, and I recommend its use to others implementing Berger- 
OUger mesh-refinement codes. 

However, for the extended-initial-data algorithm of figure 12, it's convenient to use integer coordinates 
which keep property (b), but discard property (a). The Carpet code (Schnetter et al. (2004); Schnetter 
(2001)) also uses integer coordinates of this latter type. 
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* 2nd order unstable, 4th order unstable 

K 2nd order stable, 4th order unstable 

2nd order stable, 4th order stable 



Fig. 13 This figure shows the stability behavior of unigrid evolutions with varying i and A , using Gaussian 
initial data, no source term, and a problem domain size D = lOOM. Notice that for each I, the evolutions are 
always stable for A less than some threshold value. 
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B.3 Interpolation Operators 

As discussed in sections 5.1 and 5.3, the slice-recursion algorithm needs to interpolate data from coarse to 
fine grids in several situations: 

- When creating a new grid G'*', the first few slices (1 [3] slices for globally 2nd [4th] order finite dif- 
ferencing) of the newly-created grid are initialized by interpolating from the next coarser grid G'*~'' 
(line 20 of figure 4; figure 5c). 

- When time-integrating any grid G^*' finer than the base grid, the extended initial data on each new 
GW slice (i.e., the first 1 [3] points on the slice for globally 2nd [4th] order finite differencing), must be 
interpolated from the next coarser grid G'*~'' (line 9 of figure 3; figure 5c,d). before the main integration 
of the slice can be started. 

- When moving an existing grid G^'^' to a new position in the current v = constant sUce, newly-created 
points are initialized by interpolating from the next coarser grid G(*-i) (lines 25-26 of figure 4). 

As shown in figure 14, the precise choice of interpolation operator (which is made independently at 
each G'*^ grid point) depends on the relative position of the G''' interpolation point with respect to the next 
coarser grid G'^~''. All the interpolation operators considered here are Lagrange polynomial interpolants, 
which assume smoothness, so if the interpolation position is within a few grid points of the particle woridUne 
(where is only c'), then a different interpolation operator needs to be chosen so as to avoid crossing the 
particle worldline: 

- If the interpolation point coincides with a coarse-grid (G''"'') point, then the "interpolation" is just a 

copy of the data. 

- Otherwise, if the interpolation point's time (v) coordinate coincides with that of a coarse-g rid(G<''-'')v = 
constant slice, then the interpolation is a 1 -dimensional Lagrange polynomial interpolation in space (u) 
within this slice, using 4 [6] points for globally 2nd [4th] order finite differencing. The interpolation is 
constrained not to cross the particle worldline and not to use data from outside the spatial («) extent of 
the coarse slice. The interpolation is centered if this is possible within these constraints, otherwise it's as 
minimally off-centered as is necessary to satisfy them. 

- Otherwise, if the interpolation point's spatial (u) coordinate coincides with that of a coarse- grid (G(^-i)) 
u = constant Hne of grid points, then depending on the relative position of the interpolation point and the 
particle worldline, there are two cases: 

1. If the interpolation point is not close to the particle worldline, then the interpolation is a 1 -dimensional 
Lagrange polynomial interpolation in time (v) within the u = constant line of coarse-g rid (G(<-1)) 
points, again using 4 [6] points for globally 2nd [4th] order finite differencing. The set of input 
points for this interpolation is always the most recent 4(6) slices of the coarse grid(G('-')). 

2. Alternatively, if the interpolation point is too close to the particle worldline (i.e., if the 1-dimensional 
Lagrange polynomial interpolation molecule of case 1 would cross the particle worldline), then the 
interpolation is a 2-dimensional Lagrange polynomial interpolation in spacetime, chosen so as to 
not cross the particle worldUne. This case is described further below. 

- Otherwise (i.e., if the interpolation point lies in the center of a coarse-grid (G^^"'') cell), the interpolation 
is a 2-dimensional Lagrange polynomial interpolation in spacetime, again chosen so as to not cross the 
particle worldline. This case is described further below. 

While it is straightforward to construct Lagrange polynomial interpolation operators in 1 dimension, 
doing so in 2 (or more) dimensions is more difficult. The basic concept is the same - an interpolating poly- 
nomial is matched to the known grid function values at some set of molecule points, then evaluated at the 
interpolation point - but there are several complications. 

In 1 dimension the choice of interpolating polynomial is obvious, but in multiple dimensions different 
choices are possible. That is, let (v,, be a fixed reference point somewhere near the interpolation point 
(v, u), and define the relative coordinates x = v, — v and y = Ut — u. Then an nth degree interpolating polyno- 
mial in X and y might reasonably be defined as either 

f{x,y)= £ ap,x'y (27) 

0<p+q<n 
P>0,q>0 

or as 

f{x,y)= £ apgx!'f. (28) 

0<;j<n 

0<q<n 

In my code I (somewhat arbitrarily) always use interpolating polynomials of the form (27). I use fi = 3 [5] for 
4th [6th] order LTE (corresponding to 2nd [4th] order GTE). 



35 



Given the choice of an interpolating polynomial, there are still many different molecules possible, even 
given the requirement that the values of the interpolating polynomial at the molecule points uniquely de- 
termine the polynomial coefficients. I have used the Maple symbolic algebra system (Char et al. (1983, 
version 11, http : //www.maplesof t . com/)) to experiment with different interpolation molecules and to 
compute their coefficients. Figure 15 summarizes the set of spacetime-interpolation molecules used in my 
code. The actual coefficients may be obtained from the Maple output files in the sf evol/coef f / direc- 
tory of the source code included in the electronic supplementary materials accompanying this article (online 
resource 2). 



B.4 Data Structures 

As noted earlier in this paper, the largest practical obstacle to the use of Berger-Oliger mesh refinement 
algorithms is the complexity of progranmiing, debugging, and testing them. To help reduce this complexity 
for other researchers, here I briefly outline the main data structures and debugging/testing strategies I have 
found useful in implementing the slice-recursion algorithm. 
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Q coarse-grid point 

• fine-grid point where data is copied from the coarse grid 

fin(!-grid point where data is space-interpolated from the coarse grid 

^ finc-gri{i i^oint where data is time-interpolated from the coarse grid 
(Hpacetime-interpolated when near the particle worldline) 



fine-grid point where data is spacetime-interpolated from the coarse grid 
(using variant molecules when near the particle worldline) 



Fig. 14 This figure shows the type of interpolation operator used for each possible relative position of a fine- 
grid point with respect to the next coarser grid. The space-interpolation and time-interpolation operators are 
described in the text. The spacetime-interpolation molecules are shown in figure 15. 
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cubic intcjrpolatioii: quintic interpolation: 

far from the particle far from the particle 
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cubic interpolation: near to, 
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Fig. 15 This figure shows the spacetime-interpolation molecules. In each subfigure, the open circles show 
the input points of the interpolation molecule (some of which may have zero weight in any given molecule), 
the solid circles show the various interpolation points, and in the lower 4 subfigures, the dashed line shows 
the particle worldline. 
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A SLICE object represents a single Gj slice at a single refinement level, i.e., it stores all the grid func- 
tions needed to represent the solution of the PDEs on that slice. In my code, SLICE is a C++ template with 
the template parameter selecting the PDE system (e.g., real or complex scalar field) to be supported. 

A CHUNK object stores enough adjacent slices at a single refinement level to be able to take time steps, 
i.e., it stores 4 [7] adjacent SLICE objects for the 2nd [4th] order GTE finite differencing schemes described 
in this paper. CHUNK also maintains the radius cache discussed in appendix B.l. In my code, CHUNK is a 
C++ template with 2 template parameters, one selecting the PDE system and the other selecting the finite 
differencing scheme (2nd versus 4th order GTE) and thus implicitly the number of adjacent slices to be 
stored. To avoid unnecessary data copying, at each time step CHUNK circularly rotates pointers to a fixed 
set of SLICE objects. When debugging the code, CHUNK (and SLICE) can be thoroughly tested using unigrid 
evolutions. 

A MESH object represents an entire grid hierarchy as described in section 5.1. That is, a MESH object 
stores a stack of CHUNK objects, one for each refinement level, together with the necessary bookkeeping 
information to compute the fine-to-coarse and coarse-to-fine coordinate transformations described in ap- 
pendix B.2. When debugging the code, mesh can be tested by manually creating a grid hierarchy and testing 
that the expected results are obtained for various operations on it such as adding a new refinement level, 
dropping a refinement level, moving the chunk at some refinement level to a new location, interpolating or 
copying data from one refinement level to another, or transforming the per-refinement-level coordinates from 
one refinement level to another. 

Finally, the actual slice-recursion Berger-Oliger and regridding algorithms are implemented in terms of 
the various MESH operations. I found it difficult to thoroughly test the Berger-Oliger and regridding logic, but 
this comprises a relatively small body of code - most of the overall complexity of the software lies in mesh 
and lower-level code, which is relatively straightforward to test. 



